The only prerequisite for the R training is to install the latest version of R and RStudio on your computer. These should be available in the Company Portal in Entra, and shouldn’t require special permissions to install. We’ll talk about the difference between R and RStudio on the first day, but for now, just make sure they’re installed.
Your R version should be 4.4.3 or later so that everyone’s code behaves the same way. The version you have is likely 4.5.1.
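If you want to confirm your version from inside R, the sketch below is one way to do it. It uses only base R; the variable names min_version and version_ok are made up for illustration.

```r
# Print the full version string of the R session you are running
R.version.string

# Compare the running version against the minimum stated for this training
min_version <- "4.4.3"                      # minimum version named above
version_ok <- getRversion() >= min_version  # TRUE if your R is new enough
version_ok
```

If version_ok is FALSE, installing the latest R from the Company Portal should resolve it.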
A number of packages are required to follow along with the data wrangling and visualization sessions. Please try to install these in RStudio ahead of time by running the code below. If you don’t know how to run the code, view the Running Code Screencast below to see how.
packages <- c("tidyverse", "ggthemes", "GGally", "RColorBrewer",
"viridis", "scales", "plotly", "patchwork")
install.packages(setdiff(packages, rownames(installed.packages())))
# Check that installation worked
library(tidyverse) # turns on all tidyverse packages
library(ggthemes)
library(GGally)
library(RColorBrewer)
library(viridis)
library(scales)
library(plotly)
library(patchwork)
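The library() calls above stop at the first package that fails to load. As an optional alternative, the sketch below checks all of the packages in one pass and reports which ones are still missing. The names installed_ok and missing_packages are illustrative, not part of the training materials.

```r
packages <- c("tidyverse", "ggthemes", "GGally", "RColorBrewer",
              "viridis", "scales", "plotly", "patchwork")

# requireNamespace() returns TRUE if a package can be loaded and FALSE if not,
# without attaching it or stopping at the first failure
installed_ok <- vapply(packages, requireNamespace, logical(1), quietly = TRUE)

# Names of any packages that still need attention (empty if all installed)
missing_packages <- names(installed_ok)[!installed_ok]
missing_packages
```

An empty result, shown as character(0), means every package installed correctly.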
Timing: The training will take place over 3 half days. Each day will run from 9 to 1 EST via MS Teams. I will be available for office hours during the hour before and the afternoon following training, in case there are questions that couldn’t be handled during training.
Structure: For most of the training, I will share my screen as I go through the website and then demo with live coding. Having 2 screens, one for my screen share and one for your R session, will make following along a lot easier.
Getting Help: I intentionally included people in this training whom I know to be kind and capable, and who will benefit from having better R skills. My hope is that this group is small enough and supportive enough that everyone feels comfortable openly asking questions and providing feedback to the group. However, if you aren’t comfortable asking questions openly, you can ask questions in the anonymous training feedback form. Additionally, if someone runs into an issue I can’t immediately troubleshoot (it happens), we may have to table it until office hours, and then discuss it with the group the next day (if relevant).
Objectives: Three days is barely enough to scratch the surface of what you can do in R. My goals with this training are to:
Feedback: Finally, to help me improve this training for future sessions, please leave feedback in the training feedback form. You can submit feedback multiple times and don’t need to answer every question. Responses are anonymous.
Overall goals for the first day are:
R is a programming language that was originally developed by statisticians for statistical computing and graphics. R is free and open source. That means you will never need a paid license to use it, and you can view the underlying source code of any function and suggest fixes and improvements. Since its first official release in 1995, R has remained one of the leading programming languages for statistics and data visualization, and its capabilities continue to grow.
When you install R, it comes with a simple user interface that lets you write and execute code. However, writing code in this interface is similar to writing a report in Notepad: it’s simple and straightforward, but you likely need more features than Notepad has to format your document. This is where RStudio comes in.
For more information on the history of R, visit the R Project website.
This is primarily where you write code. When you create a new script or open an existing one, it displays here. In the screenshot above, there’s a script called bat_data_wrangling.R open in the source pane. Note that if you haven’t yet opened or created a new script, you won’t see this pane until you do.
The source pane color-codes your code to make it easier to read, and detects syntax errors (the coding equivalent of a grammar checker) by flagging the line number with a red “x” and showing a squiggly line under the offending code.
When you’re ready to run all or part of your script:
This is where the code actually runs. When you first open RStudio, the console will tell you the version of R that you’re running (should be R 4.4.3 or greater).
While most often you’ll run code from a script in the source pane, you can also run code directly in the console. Code in the console won’t get saved to a file, but it’s a great way to experiment and test out lines of code before adding them to your script in the source pane. The console is also where errors appear if your code breaks. Deciphering errors can be a challenge that gets easier over time. Googling errors is a good place to start.
File organization is an important part of being a good coder. Keeping code, input data, and results together in one place will protect your sanity and the sanity of the person who inherits the project. RStudio projects help with this. Creating a new RStudio project for each new code project makes it easier to manage settings and file paths.
Before we create a project, take a look at the Console tab.
Notice that at the top of the console there is a folder path. That path
is your current working directory.
[Figure: Default working directory]
If you refer to a file in R using a relative path, for example
./data/my_data_file.csv, R will look in your current
working directory for a folder called data containing a
file called my_data_file.csv.
Note the use of forward slashes instead of backslashes for file paths. In R you can write file paths with either a forward slash (/) or a double backslash (\\). The two paths below are equivalent; each spells out the full path to the data folder that the relative path above refers to.
# forward slash file path approach
"C:/Users/KMMiller/OneDrive - DOI/data/"
## [1] "C:/Users/KMMiller/OneDrive - DOI/data/"
# backward slash file path approach
"C:\\Users\\KMMiller\\OneDrive - DOI\\data\\"
## [1] "C:\\Users\\KMMiller\\OneDrive - DOI\\data\\"
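To see how a relative path relates to a full path on your own machine, here is a minimal sketch using base R. The file name my_data_file.csv is the hypothetical one from the text above, and rel_path and full_path are illustrative names; the file does not need to exist for this to run.

```r
# Relative paths are resolved starting from the current working directory
getwd()

# file.path() joins the pieces of a path with forward slashes
rel_path <- file.path("data", "my_data_file.csv")
rel_path
# "data/my_data_file.csv"

# The full path that the relative path points to from where you are now
full_path <- file.path(getwd(), rel_path)
full_path

# Inside a quoted string, "\\" stands for a single backslash character
nchar("C:\\Users")
# 8
```

Because full_path starts with your own working directory, it will differ from one computer to the next, while rel_path stays the same.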
Using relative paths is helpful because the full path will be specific
to your computer and likely won’t work on a different computer. But
there’s no guarantee that everyone has the same default R working
directory. This is where projects come in. Projects package all of your
code, data, output, etc. into a file type that is easily transferable
to other machines regardless of file location.
imd_r_intro. Next, you’ll select
what folder to keep your project folder in. Documents/R is
a good place to store all of your R projects but it’s up to you. When
you are done, click on Create Project.
If you successfully started a project named
imd_r_intro, you should see it listed at the very top right
of your screen. As you start new projects, you’ll want to check that
you’re working in the right one before you start coding. Take a look at
the Console tab again. Notice that your current working directory
is now your project folder. When you look in the Files tab of the
bottom right pane, you’ll see that it also defaults to the project
folder.
We also want to create a folder called “data”, where we will store
datasets we’re using for this class. To do that, you can either go to
Windows Explorer and add a new folder, or run the code below. As
long as you’re working within your project (project name should be at
the top right of window), a folder named data will appear within your
project. You can check that it worked by using the
list.files() function, which lists everything in the
working directory of your project.
Create data folder
dir.create("data")
list.files() # you should see a data folder listed
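If you run dir.create("data") a second time, R gives a warning because the folder already exists. A slightly more defensive version, sketched below with base R only, checks first so the same script can be rerun without the warning.

```r
# Only create the folder if it isn't there yet
if (!dir.exists("data")) {
  dir.create("data")
}

# Confirm the folder is now present in the working directory
dir.exists("data")
```

This pattern is optional for the class, but it is handy in scripts you expect to run more than once.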
day_1_script.R. Make sure you are working in the
imd_r_intro project that you just created. Click on the
New File icon
day_1_script.R.
We’ll start with something simple. Basic math in R is pretty straightforward and the syntax is similar to simply using a graphing calculator. You can use the examples below or come up with your own. Even if you’re using the examples, try to actually type the code instead of copy-pasting - you’ll learn to code faster that way.
To run a single line of code in your script, place your cursor anywhere in that line and press CTRL+ENTER (or click the Run button in the top right of the script pane). To run multiple lines of code, highlight the lines you want to run and hit CTRL+ENTER or click Run.
To leave notes in your script, use the hashtag/pound sign (#). R treats everything after the # on that line as a comment and doesn’t run it, and RStudio shows comments in a different color. Commenting your code is one of the best habits you can form. Comments are a gift to your future self and anyone else who tries to use your code.
Type code below in your script and run each line
# By using this hashtag/pound sign, you are telling R to ignore the text afterwards. This is useful for leaving annotation of notes or instructions for yourself, or someone else using your code
# try this line to generate some basic text and become familiar with where results will appear:
print("Hello, lets do some basic math. Results of operations will appear here")
## [1] "Hello, lets do some basic math. Results of operations will appear here"
# one plus one
1+1
## [1] 2
# two times three, divided by four
(2*3)/4
## [1] 1.5
# basic mathematical and trigonometric functions are fairly similar to what they would be in Excel
# calculate the square root of 9
sqrt(9)
## [1] 3
# calculate the cube root of 8 (remember that x^(1/n) gives you the nth root of x)
8^(1/3)
## [1] 2
# get the cosine of 180 degrees - note that trig functions in R expect angles in radians
# also note that pi is a built-in constant in R
cos(pi)
## [1] -1
# calculate 5 to the tenth power
5^10
## [1] 9765625
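Since the trig functions expect radians, an angle measured in degrees has to be converted first by multiplying by pi/180. The short sketch below shows that conversion; angle_deg and angle_rad are illustrative names.

```r
# Convert an angle from degrees to radians before using trig functions
angle_deg <- 180
angle_rad <- angle_deg * pi / 180

# Same result as cos(pi) above
cos(angle_rad)
# -1
```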
Notice that when you run a line of code, the code and the result appear
in the console. You can also type code directly into the console, but it
won’t be saved anywhere. As you get more comfortable with R, it can be
helpful to use the console as a “scratchpad” for experimenting and
troubleshooting. For now, it’s best to err on the side of saving your
code as a script so that you don’t accidentally lose useful work.
Occasionally, it’s enough to just run a line of code and display the result in the console. But typically our code is more complex than adding one plus one, and we want to store the result and use it later in the script. This is where variables come in. Variables allow you to assign a value (whether that’s a number, a data table, a chunk of text, or any other type of data that R can handle) to a short, human-readable name. Anywhere you put a variable in your code, R will replace it with its value when your code runs. Variables are also called objects in R.
R uses the <- symbol for variable assignment. If
you’ve used other programming languages, you may be tempted to use
= instead. It will work, but there are subtle differences
between <- and =, so you should get in the
habit of using <-.
R is case-sensitive. So if you name one object treedata
and another Treedata or TREEDATA, R will
interpret these all as unique objects. While you can do things like
this, it’s best practice not to use the same name for different objects,
as it makes code difficult to follow.
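A quick way to convince yourself of the case-sensitivity point is to create two objects whose names differ only in capitalization. The values below are arbitrary placeholders.

```r
# Two different objects: the names differ only in the first letter's case
treedata <- "lowercase version"
Treedata <- "capitalized version"

treedata
Treedata

# R keeps them separate, so their contents are not the same
identical(treedata, Treedata)
# FALSE
```

This works, but as noted above, it makes code hard to follow, so it is shown here only as a demonstration.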
Type code below to assign values to variables named a and b
# the value of 12.098 is assigned to variable 'a'
a <- 12.098
# and the value 65.3475 is assigned to variable 'b'
b <- 65.3475
# we can now perform whatever mathematical operations we want using these two variables without having to repeatedly type out the actual numbers:
a*b
## [1] 790.5741
(a^b)/((b+a))
## [1] 7.305156e+68
sqrt((a^7)/(b*2))
## [1] 538.7261
In the code above, we assign the variables a and
b once. We can then reuse them as often as we want. This is
helpful because we save ourselves some typing, reduce the chances of
making a typo somewhere, and if we need to change the value of
a or b, we only have to do it in one
place.
Also notice that when you assign variables, you can see them listed in your Environment tab (top right pane). Remember, everything you see in the environment is just in R’s temporary memory and won’t be saved when you close out of RStudio.
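You can also inspect the environment with code rather than the Environment tab. The sketch below uses the base R functions ls() and rm(); it recreates a and b so it can be run on its own.

```r
a <- 12.098
b <- 65.3475

# ls() returns the names of the objects currently in memory
ls()

# rm() removes an object from memory
rm(b)

"a" %in% ls()  # TRUE: a is still there
"b" %in% ls()  # FALSE: b was removed
```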
All of the examples you’ve seen so far are fairly contrived for the sake of simplicity. Let’s take a look at some code that everyone here will make use of at some point: reading data from a CSV.
It’s hard to get very far in R without making use of functions. Think of a function as a programmed task that takes some kind of input (the argument(s)) from the user and outputs a result (the return value).
Note the difference in how RStudio color codes what it thinks are functions. There are a lot of pre-programmed functions in base R, which is what comes along with R when you install R. Installing R packages will add additional functions. You can also build your own. Names that R recognizes as a function are color coded differently than what R recognizes as text, numbers, etc. It’s also good practice to not use existing functions as new object names.
Commonly used base R functions include:
mean(): calculate the mean of a set of numbers
min(): calculate the minimum of a set of numbers
max(): calculate the maximum of a set of numbers
range(): calculate the min and max of a set of numbers
sd(): calculate the standard deviation of a set of numbers
sqrt(): calculate the square root of a value
Calculate mean and range to see how functions work
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
# equivalent to x <- 1:10
# bad coding
#mean <- mean(x)
# good coding
mean_x <- mean(x)
mean_x
## [1] 5.5
range_x <- range(x)
range_x
## [1] 1 10
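Many functions accept more than one argument. As an illustration that goes slightly beyond the examples above, mean() and range() both take an optional na.rm argument that tells them to ignore missing values (NA). The vector y below is made up for this sketch.

```r
# A vector with one missing value
y <- c(4, 8, NA, 12)

# With the default arguments, a missing value makes the result missing too
mean(y)
# NA

# Setting na.rm = TRUE drops the NA before calculating
mean_y <- mean(y, na.rm = TRUE)
mean_y
# 8

range_y <- range(y, na.rm = TRUE)
range_y
# 4 12
```

Running ?mean in the console opens the help page listing all of the arguments a function accepts.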
Most of the work we do in R relies on one or more existing datasets that we want to query or summarize, rather than creating our own in R. Importing data in R is therefore an important skill. R can import just about any data type, including CSV and MS Excel files. You can also import tables from MS Access and SQL databases using ODBC drivers. That’s beyond the scope of this class, but I can share examples for anyone needing to import from a database. For now, I’ll show how to work with CSVs and Excel spreadsheets.
We use the read.csv() function to import CSVs in R. The
read.csv() function takes the file path or url to the CSV
as input and outputs a data frame containing the data from the CSV. Here
we’re going to read a CSV from a website, then save that in the data
folder of our project.
Run the following line to import a teaching ACAD wetland dataset from the GitHub repository for this training
# read in the data from ACAD_wetland_data_clean.csv and assign it as a dataframe to the variable "ACAD_wetland"
ACAD_wetland <- read.csv(
"https://raw.githubusercontent.com/KateMMiller/IMD_R_Training_2026/refs/heads/main/data/ACAD_wetland_data_clean.csv"
)
View the data in a separate window by running the View()
function.
# View the ACAD_wetland data frame we just created
View(ACAD_wetland)
Or, check out the first few or last few records in your console.
# Look at the top 6 rows of the data frame
head(ACAD_wetland)
## Site_Name Site_Type Latin_Name Common Year PctFreq Ave_Cov
## 1 SEN-01 Sentinel Acer rubrum red maple 2011 0 0.02
## 2 SEN-01 Sentinel Amelanchier serviceberry 2011 20 0.02
## 3 SEN-01 Sentinel Andromeda polifolia bog rosemary 2011 80 2.22
## 4 SEN-01 Sentinel Arethusa bulbosa dragon's mouth 2011 40 0.04
## 5 SEN-01 Sentinel Aronia melanocarpa black chokeberry 2011 100 2.64
## 6 SEN-01 Sentinel Carex exilis coastal sedge 2011 60 6.60
## Invasive Protected X_Coord Y_Coord
## 1 FALSE FALSE 574855.5 4911909
## 2 FALSE FALSE 574855.5 4911909
## 3 FALSE FALSE 574855.5 4911909
## 4 FALSE TRUE 574855.5 4911909
## 5 FALSE FALSE 574855.5 4911909
## 6 FALSE FALSE 574855.5 4911909
# Look at the bottom 6 rows of the data frame
tail(ACAD_wetland)
## Site_Name Site_Type Latin_Name
## 503 RAM-05 RAM Vaccinium oxycoccos
## 504 RAM-05 RAM Vaccinium vitis-idaea
## 505 RAM-05 RAM Viburnum nudum var. cassinoides
## 506 RAM-05 RAM Viburnum nudum var. cassinoides
## 507 RAM-05 RAM Xyris montana
## 508 RAM-05 RAM Xyris montana
## Common Year PctFreq Ave_Cov Invasive Protected X_Coord
## 503 small cranberry 2012 100 0.04 FALSE FALSE 553186
## 504 lingonberry 2017 25 0.02 FALSE FALSE 553186
## 505 northern wild raisin 2017 100 0.84 FALSE FALSE 553186
## 506 northern wild raisin 2012 100 63.00 FALSE FALSE 553186
## 507 northern yellow-eyed-grass 2017 50 0.44 FALSE FALSE 553186
## 508 northern yellow-eyed-grass 2012 50 1.24 FALSE FALSE 553186
## Y_Coord
## 503 4899764
## 504 4899764
## 505 4899764
## 506 4899764
## 507 4899764
## 508 4899764
Now we’ll write the CSV to disk, then import it from your computer.
# Write the data frame to your data folder using a relative path.
# By default, write.csv adds a column with row names that are numbers. I don't
# like that, so I turn that off.
write.csv(ACAD_wetland, "./data/ACAD_wetland_data_clean.csv", row.names = FALSE)
Make sure the writing to disk worked by importing the CSV from your computer
# Read the data frame in using a relative path
ACAD_wetland <- read.csv("./data/ACAD_wetland_data_clean.csv")
# Equivalent code to read in the data frame using full path on my computer, but won't match another user.
ACAD_wetland <- read.csv("C:/Users/KMMiller/OneDrive - DOI/NETN/R_Dev/IMD_R_Training_2026/data/ACAD_wetland_data_clean.csv")
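To see what row.names = FALSE changes, the sketch below writes a small made-up data frame both ways and reads each file back in. The data frame toy and the temporary files are illustrative only; tempfile() is used so nothing is written into your project.

```r
# A tiny made-up data frame
toy <- data.frame(site = c("A", "B"), cover = c(0.5, 1.2))

# Temporary file locations, so the project folder is left untouched
file_default <- tempfile(fileext = ".csv")
file_no_rows <- tempfile(fileext = ".csv")

write.csv(toy, file_default)                     # default keeps row names
write.csv(toy, file_no_rows, row.names = FALSE)  # row names turned off

# The default adds an extra, unnamed column that read.csv() labels "X"
names(read.csv(file_default))
# "X" "site" "cover"

names(read.csv(file_no_rows))
# "site" "cover"
```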
We’ll get very familiar with data frames in this class, but for the moment just know that it’s a rectangular table of data with rows and columns. Data frames are typically organized with rows being records or observations (e.g. sampling locations, individual critters, etc.), and columns being variables that characterize those observations (e.g., species, size, date collected, X/Y coordinates). Once you have read the data in, you can take a quick look at its structure by typing the name of the variable it’s stored in.
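Besides typing the name of a data frame, base R has a few functions that give a more compact summary of its structure. The sketch below uses a small made-up data frame, toy_wetland, that borrows three column names from the wetland data so that it runs without an internet connection.

```r
# A small made-up data frame: rows are records, columns are variables
toy_wetland <- data.frame(
  Site_Name = c("SEN-01", "SEN-01", "RAM-05"),
  Year      = c(2011, 2011, 2012),
  Invasive  = c(FALSE, FALSE, TRUE)
)

# Typing the name prints the whole table
toy_wetland

# str() shows each column's name, data type, and first few values
str(toy_wetland)

# names() returns just the column names
names(toy_wetland)
# "Site_Name" "Year" "Invasive"
```

The same calls work on ACAD_wetland once you have read it in.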
Base R does not have a way to import MS Excel files. The first step
for working with Excel files (i.e., files with .xls or .xlsx
extensions), therefore, is to install the readxl package to
import .xlsx files and writexl to write files to .xlsx. The
readxl package has a couple of options for loading Excel
spreadsheets, depending on whether the extension is .xls, .xlsx, or
unknown, along with options to import different worksheets within a
spreadsheet.
The code below installs the required packages, loads them, then writes the ACAD_wetland CSV we just imported to an .xlsx file. The last step imports the .xlsx version of the ACAD wetland data.
install.packages("readxl") # only need to run once.
install.packages("writexl")
library(writexl) # saving xlsx
library(readxl) # importing xlsx
Note that the read_xlsx() function can’t read
from a URL like read.csv() can.
write_xlsx(ACAD_wetland, "./data/ACAD_wetland_data_clean.xlsx")
ACAD_wetxls <- read_xlsx(path = "./data/ACAD_wetland_data_clean.xlsx", sheet = "Sheet1")
head(ACAD_wetxls)
## # A tibble: 6 × 11
## Site_Name Site_Type Latin_Name Common Year PctFreq Ave_Cov Invasive Protected
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <lgl> <lgl>
## 1 SEN-01 Sentinel Acer rubr… red m… 2011 0 0.02 FALSE FALSE
## 2 SEN-01 Sentinel Amelanchi… servi… 2011 20 0.02 FALSE FALSE
## 3 SEN-01 Sentinel Andromeda… bog r… 2011 80 2.22 FALSE FALSE
## 4 SEN-01 Sentinel Arethusa … drago… 2011 40 0.04 FALSE TRUE
## 5 SEN-01 Sentinel Aronia me… black… 2011 100 2.64 FALSE FALSE
## 6 SEN-01 Sentinel Carex exi… coast… 2011 60 6.6 FALSE FALSE
## # ℹ 2 more variables: X_Coord <dbl>, Y_Coord <dbl>
We’re going to take a little detour into data structures at this point. It’ll all tie back in to our wetland data.
The data frame we just examined is a type of data structure. A data structure is what it sounds like: it’s a structure that holds data in an organized way. There are multiple data structures in R, including vectors, lists, arrays, matrices, data frames, and tibbles (more on this unfortunately-named data structure later). Today we’ll focus on vectors and data frames.
Vectors are the simplest data structure in R. You can think of vectors as one column of data in an Excel spreadsheet, and the elements are each row in the column. Here are some examples of vectors:
digits <- 0:9 # Use x:y to create a sequence of integers starting at x and ending at y
digits
## [1] 0 1 2 3 4 5 6 7 8 9
is_odd <- rep(c(FALSE, TRUE), 5) # Use rep(x, n) to create a vector by repeating x n times
is_odd
## [1] FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE
shoe_sizes <- c(7, 7.5, 8, 8.5, 9, 9.5, 10, 10.5, 11, 11.5)
shoe_sizes
## [1] 7.0 7.5 8.0 8.5 9.0 9.5 10.0 10.5 11.0 11.5
favorite_birds <- c("black-capped chickadee", "dark-eyed junco", "golden-crowned kinglet")
favorite_birds
## [1] "black-capped chickadee" "dark-eyed junco" "golden-crowned kinglet"
Note the use of c(). The c() function
stands for combine, and it combines elements into a single
vector. The c() function is a fairly universal way to combine multiple
elements in R, and you’re going to see it over and over.
Let’s play around with vectors a little more. We can use
is.vector() to test whether something is a vector. We can
get the length of a vector with length(). Note that single
values in R are just vectors of length one.
is.vector(digits) # Should be TRUE
## [1] TRUE
is.vector(favorite_birds) # Should also be TRUE
## [1] TRUE
length(digits) # Hopefully this is 10
## [1] 10
length(shoe_sizes)
## [1] 10
# Even single values in R are stored as vectors
length_one_chr <- "length one vector"
length_one_int <- 4L # the L stores 4 as an integer rather than a decimal number
is.vector(length_one_chr)
## [1] TRUE
is.vector(length_one_int)
## [1] TRUE
length(length_one_chr)
## [1] 1
length(length_one_int)
## [1] 1
In the examples above, each vector contains a different type of data.
digits contains integers, is_odd contains
logical (true/false) values, favorite_birds contains text,
and shoe_sizes contains decimal numbers. That’s because a given vector
can only contain a single type of data.
In R, there are four data types that we will typically encounter:
character: text, wrapped in quotation marks (e.g. "hello", "3", "R is my favorite programming language")
numeric: decimal numbers (e.g. 23, 3.1415)
integer: whole numbers. To store a number as an integer, append L to it or use as.integer() (e.g. 5L, as.integer(30)).
logical: true/false values (TRUE, FALSE). Note that TRUE and FALSE must be all-uppercase.
There are two more data types, complex and raw, but you are unlikely to encounter these so we won’t cover them here.
You can use the class() function to get the data type of
a vector:
class(favorite_birds)
## [1] "character"
class(shoe_sizes)
## [1] "numeric"
class(digits)
## [1] "integer"
class(is_odd)
## [1] "logical"
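Because a vector can only hold one data type, R converts the elements to a common type when you mix types inside c(). The sketch below shows what that looks like; the vector names are illustrative.

```r
# Mixing numbers, text, and logicals: everything becomes character
mixed_chr <- c(1, "two", TRUE)
mixed_chr
# "1" "two" "TRUE"
class(mixed_chr)
# "character"

# Mixing numbers and logicals: TRUE becomes 1 and FALSE becomes 0
mixed_num <- c(1.5, TRUE, FALSE)
mixed_num
# 1.5 1.0 0.0
class(mixed_num)
# "numeric"
```

This is worth knowing because one stray text value in a column of numbers turns the whole column into character.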
If you need to access a single element of a vector, you can use the
syntax my_vector[x] where x is the element’s
index (the number corresponding to its position in the vector).
You can also use a vector of indices to extract multiple elements from
the vector. Note that in R, indexing starts at 1
(i.e. my_vector[1] is the first element of
my_vector). If you’ve coded in other languages, you may be
used to indexing starting at 0.
second_favorite_bird <- favorite_birds[2]
second_favorite_bird
## [1] "dark-eyed junco"
top_two_birds <- favorite_birds[c(1,2)]
top_two_birds
## [1] "black-capped chickadee" "dark-eyed junco"
Logical vectors can also be used to subset a vector. The logical vector should be the same length as the vector you are subsetting; elements that line up with TRUE are kept.
odd_digits <- digits[is_odd]
odd_digits
## [1] 1 3 5 7 9
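In practice, the logical vector is usually built with a comparison rather than typed by hand. The sketch below recreates digits so it can be run on its own; is_large is an illustrative name.

```r
digits <- 0:9

# A comparison returns one TRUE or FALSE for each element
is_large <- digits > 6
is_large

# Use that logical vector to keep only the elements where it is TRUE
digits[is_large]
# 7 8 9

# The comparison can also go directly inside the brackets.
# %% gives the remainder after division, so this keeps the odd digits
digits[digits %% 2 == 1]
# 1 3 5 7 9
```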
Let’s revisit our wetland data frame. We’ve explored the data frame
as a whole, but it’s often useful to look at one column at a time. To do
this, we’ll use the $ syntax:
See list of all sites and species in the wetland data (output truncated at 10 records)
ACAD_wetland$Site_Name
ACAD_wetland$Latin_Name
## [1] "SEN-01" "SEN-01" "SEN-01" "SEN-01" "SEN-01" "SEN-01" "SEN-01" "SEN-01"
## [9] "SEN-01" "SEN-01"
## [1] "Acer rubrum" "Amelanchier"
## [3] "Andromeda polifolia" "Arethusa bulbosa"
## [5] "Aronia melanocarpa" "Carex exilis"
## [7] "Chamaedaphne calyculata" "Drosera intermedia"
## [9] "Drosera rotundifolia" "Empetrum nigrum"
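What the $ syntax hands back is an ordinary vector, so everything covered in the vector section applies to it. The sketch below uses a small made-up data frame, toy_wetland, with two column names borrowed from the wetland data; head(x, n) is one way to limit output to the first n values.

```r
# A small made-up data frame
toy_wetland <- data.frame(
  Site_Name  = c("SEN-01", "SEN-01", "RAM-05", "RAM-05"),
  Latin_Name = c("Acer rubrum", "Carex exilis", "Xyris montana", "Acer rubrum")
)

# A column pulled out with $ is a vector
site_names <- toy_wetland$Site_Name
is.vector(site_names)
# TRUE

# Show only the first 2 values of a column
head(toy_wetland$Latin_Name, 2)
# "Acer rubrum" "Carex exilis"

# Index a column like any other vector
toy_wetland$Latin_Name[3]
# "Xyris montana"
```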
You can also use square brackets [] to access data frame
columns. Square brackets are base R’s way to view different subsets of
your data. I’m only going to touch briefly on this, so you have a basic
understanding of how to interpret square brackets. Tomorrow I’ll show
you much easier ways to subset your data using tidyverse functions.
Every data frame has 2 dimensions. The first dimension is rows and the second is columns. The code below asks for the dimensions of the ACAD_wetland data frame, and returns 508 11. That means there are 508 rows, and 11 columns. The square brackets allow you to either subset rows, columns, or both at the same time, with rows specified first and columns second.
Return data frame number of rows and columns by checking data frame dimensions
dim(ACAD_wetland)
## [1] 508 11
nrow(ACAD_wetland) # first dim
## [1] 508
ncol(ACAD_wetland) # second dim
## [1] 11
Return first 5 rows of the data frame
ACAD_wetland[1:5,]
## Site_Name Site_Type Latin_Name Common Year PctFreq Ave_Cov
## 1 SEN-01 Sentinel Acer rubrum red maple 2011 0 0.02
## 2 SEN-01 Sentinel Amelanchier serviceberry 2011 20 0.02
## 3 SEN-01 Sentinel Andromeda polifolia bog rosemary 2011 80 2.22
## 4 SEN-01 Sentinel Arethusa bulbosa dragon's mouth 2011 40 0.04
## 5 SEN-01 Sentinel Aronia melanocarpa black chokeberry 2011 100 2.64
## Invasive Protected X_Coord Y_Coord
## 1 FALSE FALSE 574855.5 4911909
## 2 FALSE FALSE 574855.5 4911909
## 3 FALSE FALSE 574855.5 4911909
## 4 FALSE TRUE 574855.5 4911909
## 5 FALSE FALSE 574855.5 4911909
ACAD_wetland[c(1, 2, 3, 4, 5),] #equivalent but more typing
## Site_Name Site_Type Latin_Name Common Year PctFreq Ave_Cov
## 1 SEN-01 Sentinel Acer rubrum red maple 2011 0 0.02
## 2 SEN-01 Sentinel Amelanchier serviceberry 2011 20 0.02
## 3 SEN-01 Sentinel Andromeda polifolia bog rosemary 2011 80 2.22
## 4 SEN-01 Sentinel Arethusa bulbosa dragon's mouth 2011 40 0.04
## 5 SEN-01 Sentinel Aronia melanocarpa black chokeberry 2011 100 2.64
## Invasive Protected X_Coord Y_Coord
## 1 FALSE FALSE 574855.5 4911909
## 2 FALSE FALSE 574855.5 4911909
## 3 FALSE FALSE 574855.5 4911909
## 4 FALSE TRUE 574855.5 4911909
## 5 FALSE FALSE 574855.5 4911909
Return first 5 rows and a subset of columns of the data frame
ACAD_wetland[1:5, c("Site_Name", "Latin_Name", "Common", "Year", "PctFreq")]
## Site_Name Latin_Name Common Year PctFreq
## 1 SEN-01 Acer rubrum red maple 2011 0
## 2 SEN-01 Amelanchier serviceberry 2011 20
## 3 SEN-01 Andromeda polifolia bog rosemary 2011 80
## 4 SEN-01 Arethusa bulbosa dragon's mouth 2011 40
## 5 SEN-01 Aronia melanocarpa black chokeberry 2011 100
CHALLENGE: How would you look at the first 4 even rows (2, 4, 6, 8)?
ACAD_wetland[c(2, 4, 6, 8),]
## Site_Name Site_Type Latin_Name Common Year PctFreq Ave_Cov
## 2 SEN-01 Sentinel Amelanchier serviceberry 2011 20 0.02
## 4 SEN-01 Sentinel Arethusa bulbosa dragon's mouth 2011 40 0.04
## 6 SEN-01 Sentinel Carex exilis coastal sedge 2011 60 6.60
## 8 SEN-01 Sentinel Drosera intermedia spoonleaf sundew 2011 60 0.06
## Invasive Protected X_Coord Y_Coord
## 2 FALSE FALSE 574855.5 4911909
## 4 FALSE TRUE 574855.5 4911909
## 6 FALSE FALSE 574855.5 4911909
## 8 FALSE FALSE 574855.5 4911909
You can specify columns by name or by index (integer indicating
position of column). It’s almost always best to refer to columns by name
when possible because it makes your code easier to read and prevents
your code from breaking if columns get reordered. But, in case you come
across code with numbers in the column part of the brackets, here’s what
it looks like. Note the empty space to the left of the comma. That means
you want all rows, but only the first 4 columns.
Return all rows and first 4 columns of the data frame
ACAD_sub <- ACAD_wetland[ , 1:4] # works, but risky
ACAD_sub2 <-
ACAD_wetland[,c("Site_Name", "Site_Type", "Latin_Name", "Common")] #same result, but better
# compare the two data frames to the original
head(ACAD_wetland)
## Site_Name Site_Type Latin_Name Common Year PctFreq Ave_Cov
## 1 SEN-01 Sentinel Acer rubrum red maple 2011 0 0.02
## 2 SEN-01 Sentinel Amelanchier serviceberry 2011 20 0.02
## 3 SEN-01 Sentinel Andromeda polifolia bog rosemary 2011 80 2.22
## 4 SEN-01 Sentinel Arethusa bulbosa dragon's mouth 2011 40 0.04
## 5 SEN-01 Sentinel Aronia melanocarpa black chokeberry 2011 100 2.64
## 6 SEN-01 Sentinel Carex exilis coastal sedge 2011 60 6.60
## Invasive Protected X_Coord Y_Coord
## 1 FALSE FALSE 574855.5 4911909
## 2 FALSE FALSE 574855.5 4911909
## 3 FALSE FALSE 574855.5 4911909
## 4 FALSE TRUE 574855.5 4911909
## 5 FALSE FALSE 574855.5 4911909
## 6 FALSE FALSE 574855.5 4911909
head(ACAD_sub)
## Site_Name Site_Type Latin_Name Common
## 1 SEN-01 Sentinel Acer rubrum red maple
## 2 SEN-01 Sentinel Amelanchier serviceberry
## 3 SEN-01 Sentinel Andromeda polifolia bog rosemary
## 4 SEN-01 Sentinel Arethusa bulbosa dragon's mouth
## 5 SEN-01 Sentinel Aronia melanocarpa black chokeberry
## 6 SEN-01 Sentinel Carex exilis coastal sedge
head(ACAD_sub2)
## Site_Name Site_Type Latin_Name Common
## 1 SEN-01 Sentinel Acer rubrum red maple
## 2 SEN-01 Sentinel Amelanchier serviceberry
## 3 SEN-01 Sentinel Andromeda polifolia bog rosemary
## 4 SEN-01 Sentinel Arethusa bulbosa dragon's mouth
## 5 SEN-01 Sentinel Aronia melanocarpa black chokeberry
## 6 SEN-01 Sentinel Carex exilis coastal sedge
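To make the risk of column numbers concrete, the sketch below reorders the columns of a small made-up data frame and then runs the same two subsets on it. The data frame toy and the reordering are hypothetical; the point is only that position-based code silently returns different columns.

```r
# A small made-up data frame
toy <- data.frame(
  Site_Name  = c("SEN-01", "RAM-05"),
  Latin_Name = c("Acer rubrum", "Xyris montana"),
  Year       = c(2011, 2012)
)

# Suppose a later version of the data arrives with Year moved to the front
reordered <- toy[, c("Year", "Site_Name", "Latin_Name")]

# Subsetting by position now returns different columns, with no error
names(toy[, 1:2])
# "Site_Name" "Latin_Name"
names(reordered[, 1:2])
# "Year" "Site_Name"

# Subsetting by name returns the same columns either way
names(reordered[, c("Site_Name", "Latin_Name")])
# "Site_Name" "Latin_Name"
```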
You can do more than just subset by row numbers and column names. A couple of more advanced uses of brackets are below. Again, this is for exposure, in case you come across them while reading through a StackOverflow post. There are easier ways to subset your data in R, which we will cover on Day 2. Another important point about R is that there are often multiple ways to perform a task. The best code is code that works, is easy to follow, and is unlikely to break (e.g. uses column names instead of numbers). That still means there are typically multiple equally valid approaches. There are other ways to judge good code as you advance, but for now, meeting the 3
Filter data so only records where Invasive is TRUE are returned
ACAD_wetland$Latin_Name[ACAD_wetland$Invasive == TRUE]
## [1] "Berberis thunbergii" "Berberis thunbergii" "Berberis thunbergii"
## [4] "Celastrus orbiculatus" "Rhamnus frangula" "Rhamnus frangula"
## [7] "Rhamnus frangula" "Rhamnus frangula" "Lonicera - Exotic"
ACAD_wetland[ACAD_wetland$Invasive == TRUE, "Latin_Name"] # equivalent
## [1] "Berberis thunbergii" "Berberis thunbergii" "Berberis thunbergii"
## [4] "Celastrus orbiculatus" "Rhamnus frangula" "Rhamnus frangula"
## [7] "Rhamnus frangula" "Rhamnus frangula" "Lonicera - Exotic"
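Row filters like the ones above can combine more than one condition with & (and). The sketch below uses a small made-up data frame, toy, with column names borrowed from the wetland data. It also shows that a logical column can be used directly, since Invasive == TRUE returns the same TRUE/FALSE values the column already holds.

```r
# A small made-up data frame
toy <- data.frame(
  Latin_Name = c("Berberis thunbergii", "Acer rubrum",
                 "Rhamnus frangula", "Rhamnus frangula"),
  Year       = c(2011, 2011, 2012, 2017),
  Invasive   = c(TRUE, FALSE, TRUE, TRUE)
)

# A logical column can be used as the row filter directly
toy$Latin_Name[toy$Invasive]
# "Berberis thunbergii" "Rhamnus frangula" "Rhamnus frangula"

# Combine two conditions with &: invasive records after 2011
recent_invasives <- toy[toy$Invasive & toy$Year > 2011, c("Latin_Name", "Year")]
recent_invasives
```

Here recent_invasives holds the two Rhamnus frangula records from 2012 and 2017.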
Return only unique species sorted alphabetically.
sort(unique(ACAD_wetland[, "Latin_Name"]))
## [1] "Acer rubrum" "Alnus incana"
## [3] "Alnus incana++" "Amelanchier"
## [5] "Andromeda polifolia" "Apocynum androsaemifolium"
## [7] "Arethusa bulbosa" "Aronia melanocarpa"
## [9] "Berberis thunbergii" "Betula populifolia"
## [11] "Calamagrostis canadensis" "Calopogon tuberosus"
## [13] "Carex" "Carex atlantica"
## [15] "Carex exilis" "Carex folliculata"
## [17] "Carex lacustris" "Carex lasiocarpa"
## [19] "Carex limosa" "Carex magellanica"
## [21] "Carex Ovalis group" "Carex pauciflora"
## [23] "Carex stricta" "Carex trisperma"
## [25] "Carex utriculata" "Celastrus orbiculatus"
## [27] "Chamaedaphne calyculata" "Comptonia peregrina"
## [29] "Cornus canadensis" "Danthonia spicata"
## [31] "Dichanthelium acuminatum" "Doellingeria umbellata"
## [33] "Drosera intermedia" "Drosera rotundifolia"
## [35] "Dryopteris cristata" "Dulichium arundinaceum"
## [37] "Empetrum nigrum" "Epilobium leptophyllum"
## [39] "Equisetum arvense" "Eriophorum angustifolium"
## [41] "Eriophorum tenellum" "Eriophorum vaginatum"
## [43] "Eriophorum virginicum" "Eurybia macrophylla"
## [45] "Eurybia radula" "Festuca filiformis"
## [47] "Gaultheria hispidula" "Gaylussacia baccata"
## [49] "Gaylussacia dumosa" "Glyceria"
## [51] "Glyceria striata" "Ilex mucronata"
## [53] "Ilex verticillata" "Iris versicolor"
## [55] "Juncus acuminatus" "Juncus canadensis"
## [57] "Juncus effusus" "Juniperus communis"
## [59] "Kalmia angustifolia" "Kalmia polifolia"
## [61] "Larix laricina" "Lonicera - Exotic"
## [63] "Lupinus polyphyllus" "Lysimachia terrestris"
## [65] "Maianthemum canadense" "Maianthemum trifolium"
## [67] "Malus" "Melampyrum lineare"
## [69] "Monotropa uniflora" "Morella pensylvanica"
## [71] "Muhlenbergia uniflora" "Myrica gale"
## [73] "Nuphar variegata" "Oclemena nemoralis"
## [75] "Oclemena X blakei" "Onoclea sensibilis"
## [77] "Osmunda regalis" "Osmundastrum cinnamomea"
## [79] "Phleum pratense" "Picea glauca"
## [81] "Picea mariana" "Picea rubens"
## [83] "Pinus banksiana" "Pinus strobus"
## [85] "Pogonia ophioglossoides" "Populus grandidentata"
## [87] "Populus tremuloides" "Prenanthes"
## [89] "Quercus rubra" "Ranunculus acris"
## [91] "Rhamnus frangula" "Rhododendron canadense"
## [93] "Rhododendron groenlandicum" "Rhynchospora alba"
## [95] "Rosa nitida" "Rosa palustris"
## [97] "Rosa virginiana" "Rubus"
## [99] "Rubus flagellaris" "Rubus hispidus"
## [101] "Salix" "Salix petiolaris"
## [103] "Sarracenia purpurea" "Scirpus cyperinus"
## [105] "Scutellaria" "Scutellaria lateriflora"
## [107] "Solidago rugosa" "Solidago uliginosa"
## [109] "Sorbus americana" "Spiraea alba"
## [111] "Spiraea tomentosa" "Symphyotrichum novi-belgii"
## [113] "Symplocarpus foetidus" "Thelypteris palustris"
## [115] "Thuja occidentalis" "Triadenum"
## [117] "Triadenum virginicum" "Trichophorum cespitosum"
## [119] "Trientalis borealis" "Typha latifolia"
## [121] "Utricularia cornuta" "Vaccinium angustifolium"
## [123] "Vaccinium corymbosum" "Vaccinium macrocarpon"
## [125] "Vaccinium myrtilloides" "Vaccinium oxycoccos"
## [127] "Vaccinium vitis-idaea" "Veronica officinalis"
## [129] "Viburnum nudum" "Viburnum nudum var. cassinoides"
## [131] "Vicia cracca" "Viola"
## [133] "Xyris montana"
sort(unique(ACAD_wetland$Latin_Name)) # equivalent
## [1] "Acer rubrum" "Alnus incana"
## [3] "Alnus incana++" "Amelanchier"
## [5] "Andromeda polifolia" "Apocynum androsaemifolium"
## [7] "Arethusa bulbosa" "Aronia melanocarpa"
## [9] "Berberis thunbergii" "Betula populifolia"
## [11] "Calamagrostis canadensis" "Calopogon tuberosus"
## [13] "Carex" "Carex atlantica"
## [15] "Carex exilis" "Carex folliculata"
## [17] "Carex lacustris" "Carex lasiocarpa"
## [19] "Carex limosa" "Carex magellanica"
## [21] "Carex Ovalis group" "Carex pauciflora"
## [23] "Carex stricta" "Carex trisperma"
## [25] "Carex utriculata" "Celastrus orbiculatus"
## [27] "Chamaedaphne calyculata" "Comptonia peregrina"
## [29] "Cornus canadensis" "Danthonia spicata"
## [31] "Dichanthelium acuminatum" "Doellingeria umbellata"
## [33] "Drosera intermedia" "Drosera rotundifolia"
## [35] "Dryopteris cristata" "Dulichium arundinaceum"
## [37] "Empetrum nigrum" "Epilobium leptophyllum"
## [39] "Equisetum arvense" "Eriophorum angustifolium"
## [41] "Eriophorum tenellum" "Eriophorum vaginatum"
## [43] "Eriophorum virginicum" "Eurybia macrophylla"
## [45] "Eurybia radula" "Festuca filiformis"
## [47] "Gaultheria hispidula" "Gaylussacia baccata"
## [49] "Gaylussacia dumosa" "Glyceria"
## [51] "Glyceria striata" "Ilex mucronata"
## [53] "Ilex verticillata" "Iris versicolor"
## [55] "Juncus acuminatus" "Juncus canadensis"
## [57] "Juncus effusus" "Juniperus communis"
## [59] "Kalmia angustifolia" "Kalmia polifolia"
## [61] "Larix laricina" "Lonicera - Exotic"
## [63] "Lupinus polyphyllus" "Lysimachia terrestris"
## [65] "Maianthemum canadense" "Maianthemum trifolium"
## [67] "Malus" "Melampyrum lineare"
## [69] "Monotropa uniflora" "Morella pensylvanica"
## [71] "Muhlenbergia uniflora" "Myrica gale"
## [73] "Nuphar variegata" "Oclemena nemoralis"
## [75] "Oclemena X blakei" "Onoclea sensibilis"
## [77] "Osmunda regalis" "Osmundastrum cinnamomea"
## [79] "Phleum pratense" "Picea glauca"
## [81] "Picea mariana" "Picea rubens"
## [83] "Pinus banksiana" "Pinus strobus"
## [85] "Pogonia ophioglossoides" "Populus grandidentata"
## [87] "Populus tremuloides" "Prenanthes"
## [89] "Quercus rubra" "Ranunculus acris"
## [91] "Rhamnus frangula" "Rhododendron canadense"
## [93] "Rhododendron groenlandicum" "Rhynchospora alba"
## [95] "Rosa nitida" "Rosa palustris"
## [97] "Rosa virginiana" "Rubus"
## [99] "Rubus flagellaris" "Rubus hispidus"
## [101] "Salix" "Salix petiolaris"
## [103] "Sarracenia purpurea" "Scirpus cyperinus"
## [105] "Scutellaria" "Scutellaria lateriflora"
## [107] "Solidago rugosa" "Solidago uliginosa"
## [109] "Sorbus americana" "Spiraea alba"
## [111] "Spiraea tomentosa" "Symphyotrichum novi-belgii"
## [113] "Symplocarpus foetidus" "Thelypteris palustris"
## [115] "Thuja occidentalis" "Triadenum"
## [117] "Triadenum virginicum" "Trichophorum cespitosum"
## [119] "Trientalis borealis" "Typha latifolia"
## [121] "Utricularia cornuta" "Vaccinium angustifolium"
## [123] "Vaccinium corymbosum" "Vaccinium macrocarpon"
## [125] "Vaccinium myrtilloides" "Vaccinium oxycoccos"
## [127] "Vaccinium vitis-idaea" "Veronica officinalis"
## [129] "Viburnum nudum" "Viburnum nudum var. cassinoides"
## [131] "Vicia cracca" "Viola"
## [133] "Xyris montana"
We’ve already explored the wetland data a bit using
head(), str(), and View(). These
are functions that you will use over and over as you work with data in
R. Below, I’m going to show how I get to know a data set in R.
Read in example NETN tree data from a URL
trees <- read.csv("https://raw.githubusercontent.com/KateMMiller/IMD_R_Training_2026/refs/heads/main/data/NETN_tree_data.csv")
Look at first few records
head(trees)
## Plot_Name ParkUnit PlotCode SampleDate IsQAQC SampleYear TagCode TSN
## 1 MIMA-012 MIMA 12 6/16/2025 FALSE 2025 13 183385
## 2 MIMA-012 MIMA 12 6/16/2025 FALSE 2025 12 28728
## 3 MIMA-012 MIMA 12 6/16/2025 FALSE 2025 11 28728
## 4 MIMA-012 MIMA 12 6/16/2025 FALSE 2025 2 28728
## 5 MIMA-012 MIMA 12 6/16/2025 FALSE 2025 10 28728
## 6 MIMA-012 MIMA 12 6/16/2025 FALSE 2025 7 28728
## ScientificName DBHcm TreeStatusCode CrownClassCode DecayClassCode
## 1 Pinus strobus 24.9 AS 5 <NA>
## 2 Acer rubrum 10.9 AB 5 <NA>
## 3 Acer rubrum 18.8 AS 3 <NA>
## 4 Acer rubrum 51.2 AS 3 <NA>
## 5 Acer rubrum 38.2 AS 3 <NA>
## 6 Acer rubrum 22.5 AS 4 <NA>
Look at summary of the columns
summary(trees)
## Plot_Name ParkUnit PlotCode SampleDate
## Length:164 Length:164 Min. :11.00 Length:164
## Class :character Class :character 1st Qu.:14.00 Class :character
## Mode :character Mode :character Median :16.50 Mode :character
## Mean :16.05
## 3rd Qu.:19.00
## Max. :20.00
##
## IsQAQC SampleYear TagCode TSN
## Mode :logical Min. :2025 Min. : 1.0 Min. : 19049
## FALSE:164 1st Qu.:2025 1st Qu.: 7.0 1st Qu.: 24764
## Median :2025 Median :12.5 Median : 28728
## Mean :2025 Mean :13.6 Mean : 62361
## 3rd Qu.:2025 3rd Qu.:19.0 3rd Qu.: 32929
## Max. :2025 Max. :36.0 Max. :565478
##
## ScientificName DBHcm TreeStatusCode CrownClassCode
## Length:164 Min. : 10.00 Length:164 Min. :1.000
## Class :character 1st Qu.: 13.12 Class :character 1st Qu.:3.000
## Mode :character Median : 19.00 Mode :character Median :5.000
## Mean : 25.47 Mean :4.165
## 3rd Qu.: 28.45 3rd Qu.:5.000
## Max. :443.00 Max. :6.000
## NA's :25
## DecayClassCode
## Length:164
## Class :character
## Mode :character
##
##
##
##
There’s a lot to digest from the summary results.
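One thing worth pulling out of a summary is how many missing values each column has. The sketch below counts NAs per column on a small made-up data frame (`toy` is hypothetical and only borrows a few column names from the tree data), so it runs without downloading anything.

```r
# Hypothetical toy data standing in for a few columns of the tree data
toy <- data.frame(
  DBHcm = c(24.9, 10.9, 18.8, 51.2),
  CrownClassCode = c(5, NA, 3, NA),
  DecayClassCode = c(NA, "1", "PM", NA)
)

# is.na() returns a TRUE/FALSE matrix; colSums() counts the TRUEs per column
na_counts <- colSums(is.na(toy))
na_counts
```

The same call should work on any data frame, for example colSums(is.na(trees)).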
Look at structure of each column
str(trees)
## 'data.frame': 164 obs. of 13 variables:
## $ Plot_Name : chr "MIMA-012" "MIMA-012" "MIMA-012" "MIMA-012" ...
## $ ParkUnit : chr "MIMA" "MIMA" "MIMA" "MIMA" ...
## $ PlotCode : int 12 12 12 12 12 12 12 12 12 12 ...
## $ SampleDate : chr "6/16/2025" "6/16/2025" "6/16/2025" "6/16/2025" ...
## $ IsQAQC : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ SampleYear : int 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 ...
## $ TagCode : int 13 12 11 2 10 7 5 9 1 3 ...
## $ TSN : int 183385 28728 28728 28728 28728 28728 28728 28728 28728 28728 ...
## $ ScientificName: chr "Pinus strobus" "Acer rubrum" "Acer rubrum" "Acer rubrum" ...
## $ DBHcm : num 24.9 10.9 18.8 51.2 38.2 22.5 26.4 42.9 12.3 49 ...
## $ TreeStatusCode: chr "AS" "AB" "AS" "AS" ...
## $ CrownClassCode: int 5 5 3 3 3 4 NA NA NA NA ...
## $ DecayClassCode: chr NA NA NA NA ...
Look at unique values for DecayClassCode.
sort(unique(trees$DecayClassCode)) # sorts the unique values in the column
## [1] "1" "2" "3" "PM"
table(trees$DecayClassCode) # shows the number of records per value - very handy
##
## 1 2 3 PM
## 9 6 8 2
There are 2 records called “PM”, which stands for Permanently Missing in our forest data. We will convert PM to a blank, which R calls NA, and create a new decay class column that is converted to numeric.
Convert “PM” to blank. I will first make a copy of the data frame.
trees2 <- trees
trees2$DecayClassCode[trees2$DecayClassCode == "PM"] <- NA
trees2$DecayClassCode_num <- as.numeric(trees2$DecayClassCode)
# check that it worked
str(trees2)
## 'data.frame': 164 obs. of 14 variables:
## $ Plot_Name : chr "MIMA-012" "MIMA-012" "MIMA-012" "MIMA-012" ...
## $ ParkUnit : chr "MIMA" "MIMA" "MIMA" "MIMA" ...
## $ PlotCode : int 12 12 12 12 12 12 12 12 12 12 ...
## $ SampleDate : chr "6/16/2025" "6/16/2025" "6/16/2025" "6/16/2025" ...
## $ IsQAQC : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ SampleYear : int 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 ...
## $ TagCode : int 13 12 11 2 10 7 5 9 1 3 ...
## $ TSN : int 183385 28728 28728 28728 28728 28728 28728 28728 28728 28728 ...
## $ ScientificName : chr "Pinus strobus" "Acer rubrum" "Acer rubrum" "Acer rubrum" ...
## $ DBHcm : num 24.9 10.9 18.8 51.2 38.2 22.5 26.4 42.9 12.3 49 ...
## $ TreeStatusCode : chr "AS" "AB" "AS" "AS" ...
## $ CrownClassCode : int 5 5 3 3 3 4 NA NA NA NA ...
## $ DecayClassCode : chr NA NA NA NA ...
## $ DecayClassCode_num: num NA NA NA NA NA NA 1 3 2 3 ...
sort(unique(trees2$DecayClassCode_num))
## [1] 1 2 3
The other option would be to drop records with PM. Here we will use
the base R subset() function. You first have to tell it
which data frame you’re subsetting. Then you tell it the logic to use to
subset. In this case, ! is interpreted in R as “NOT”, so DecayClassCode !=
“PM” means to keep all records where the decay code is not equal
to PM. Note that subset() also drops records where DecayClassCode is NA,
because the comparison returns NA rather than TRUE.
Remove records with “PM” as DecayClassCode
trees3 <- subset(trees, DecayClassCode != "PM")
trees3 <- trees[which(trees$DecayClassCode != "PM"),] # equivalent but not as easy to follow; which() drops NA comparisons like subset() does
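A detail worth checking when a column contains NA: subset() and bracket indexing treat an NA comparison differently. Here is a minimal sketch on a made-up data frame (`toy` is hypothetical, not the NETN data).

```r
# Hypothetical toy data with one missing decay class
toy <- data.frame(
  TagCode = 1:4,
  DecayClassCode = c("1", "PM", NA, "3")
)

# subset() keeps only rows where the condition is TRUE, so the NA row is dropped
sub_rows <- subset(toy, DecayClassCode != "PM")

# Brackets return a row of NAs wherever the condition itself is NA
bracket_rows <- toy[toy$DecayClassCode != "PM", ]

# Wrapping the condition in which() drops the NA comparisons
which_rows <- toy[which(toy$DecayClassCode != "PM"), ]

nrow(sub_rows)
nrow(bracket_rows)
nrow(which_rows)
```

If you want to keep the NA records and drop only PM, one option is to make the condition explicit: toy[is.na(toy$DecayClassCode) | toy$DecayClassCode != "PM", ].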
Visualizing the data is also important to get a sense for the data and look for potential errors and outliers.
Histograms are a great start. The code below generates a basic
histogram plot of a specific column in the dataframe using the
hist() function.
Plot histogram of DBH measurements
hist(x = trees$DBHcm)
Looking at the histogram, it looks like all of the measurements are
below 100cm except for one that’s way out in the 400 range. You can also
make a scatterplot of the data. If you only specify one column, the x
axis will be the row number for each record, and the y axis will be the
specified column.
Make point plot of DBH measurements
plot(trees$DBHcm)
Again, you can see there’s one value that’s greater than all of the others.
We can also plot two variables in a scatterplot.
Make scatterplot of crown class vs. DBH measurements
plot(trees$DBHcm ~ trees$CrownClassCode)
plot(DBHcm ~ CrownClassCode, data = trees) # equivalent but cleaner axis titles
Again, you can see there’s one value that’s greater than all of the
others, and it’s crown class code 3 (codominant).
CHALLENGE: Using the skills you just learned, find the DBH record that’s > 400cm DBH.
There are multiple ways to do this. Two examples are below.
Option 1. View the data and sort by DBH.
View(trees)
Option 2. Find the max DBH value and subset the data frame
max_dbh <- max(trees$DBHcm, na.rm = TRUE)
trees[trees$DBHcm == max_dbh,]
## Plot_Name ParkUnit PlotCode SampleDate IsQAQC SampleYear TagCode TSN
## 26 MIMA-016 MIMA 16 6/17/2025 FALSE 2025 1 19447
## ScientificName DBHcm TreeStatusCode CrownClassCode DecayClassCode
## 26 Quercus velutina 443 AS 3 <NA>
CHALLENGE: Using the skills you just learned, what is the value of the largest DBH, and which record does it belong to?
There are multiple ways to do this. Two examples are below.
Option 1. View the data and sort by DBH.
View(trees)
Option 2. Find the max DBH value and subset the data frame
max_dbh <- max(trees$DBHcm, na.rm = TRUE)
max_dbh #443
## [1] 443
trees[trees$DBHcm == max_dbh,]
## Plot_Name ParkUnit PlotCode SampleDate IsQAQC SampleYear TagCode TSN
## 26 MIMA-016 MIMA 16 6/17/2025 FALSE 2025 1 19447
## ScientificName DBHcm TreeStatusCode CrownClassCode DecayClassCode
## 26 Quercus velutina 443 AS 3 <NA>
# Plot MIMA-016, TagCode = 1.
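Two more base R approaches can get to the same record: which.max() returns the position of the largest value, and order() sorts the rows. The sketch below uses a made-up data frame (`toy` and its plot names are hypothetical) rather than the tree data.

```r
# Hypothetical toy data; column names are borrowed from the tree data
toy <- data.frame(
  Plot_Name = c("A-001", "A-001", "A-002"),
  TagCode = c(1, 2, 1),
  DBHcm = c(24.9, 443, 18.8)
)

# which.max() returns the row position of the first maximum, ignoring NAs
toy[which.max(toy$DBHcm), ]

# order() sorts the whole data frame, here with the largest DBH first
toy_sorted <- toy[order(toy$DBHcm, decreasing = TRUE), ]
head(toy_sorted, 1)
```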
Now let’s say that you looked at the datasheet, and the actual DBH for that tree was 44.3 instead of 443.0. You can change that value in the original CSV by hand. But even better is to document that change in code. I always create a new data frame when I modify the original data frame, so I can always refer back to the original while I’m coding. I also use a pretty specific filter to make sure I’m not accidentally changing other data.
Replace 443 with 44.3 in code
# create copy of trees data
trees_fix <- trees
# find the problematic DBH value, and change it to 44.3
trees_fix$DBHcm[trees_fix$Plot_Name == "MIMA-016" & trees_fix$TagCode == 1 & trees_fix$DBHcm == 443] <- 44.3
CHALLENGE: How would you check that the line of code above worked?
There are multiple ways to do this. Three examples are below.
Option 1. Show the range of the original and fixed data frames
range(trees$DBHcm)
## [1] 10 443
range(trees_fix$DBHcm)
## [1] 10.0 81.5
Option 2. Plot a histogram of the original and fixed data frames
hist(trees$DBHcm)
hist(trees_fix$DBHcm)
Option 3. Calculate max of DBHcm column
max(trees$DBHcm)
## [1] 443
max(trees_fix$DBHcm)
## [1] 81.5
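Another way to check a fix is to compare the original and fixed columns directly, so you can see exactly which rows changed. This is a minimal sketch on a made-up data frame (`toy`, `toy_fix`, and the plot names are hypothetical), not the author’s workflow.

```r
# Hypothetical toy data with one obvious typo in DBHcm
toy <- data.frame(
  Plot_Name = c("A-001", "A-001", "A-002"),
  TagCode = c(1, 2, 1),
  DBHcm = c(24.9, 443, 18.8)
)

# Copy the data, then fix the one bad value with a specific filter
toy_fix <- toy
toy_fix$DBHcm[toy_fix$Plot_Name == "A-001" & toy_fix$TagCode == 2 &
                toy_fix$DBHcm == 443] <- 44.3

# Row positions where the original and fixed values differ
changed <- which(toy$DBHcm != toy_fix$DBHcm)
toy[changed, ]
toy_fix[changed, ]

# stopifnot() throws an error if any condition is not TRUE
stopifnot(length(changed) == 1, max(toy_fix$DBHcm) < 100)
```

Putting a stopifnot() line in a script means the check reruns every time the script does.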
There are a number of options to get help with R. If you’re trying to
figure out how to use a function, you can type ?function_name. For
example ?plot will show the R documentation for that
function in the Help panel.
Get help for the functions below
?plot
?dplyr::filter
You can also press F1 while the cursor is on a function name to access
the help for that function. Help documents in R are standardized to help
you find what you’re looking for.
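A few other base R helpers can be useful alongside ? and F1. The sketch below shows some that I’m confident exist in base R; the function names looked up (mean, hist) are just examples.

```r
# Open a help page by name (same as ?mean)
help("mean")

# Search help pages by keyword when you don't know the function name
# (same as ??histogram)
help.search("histogram")

# Print a function's arguments in the console
args(hist)

# Run the examples listed at the bottom of a help page
example(mean)
```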
Great online resources to find answers to questions include Stack Exchange and Stack Overflow. Google searches are usually my first step, and I include “in R” and the package name (if applicable) in every search related to R code. If you’re troubleshooting an error message, copying and pasting the error message verbatim into a search engine often helps.
Don’t hesitate to reach out to colleagues for help as well! If you are stuck on something and the answers on Google are more confusing than helpful, don’t be afraid to ask a human. Every experienced R programmer was a beginner once, so chances are they’ve encountered the same problem as you at some point. There is an R-focused Data Science Community of Practice for I&M folks, which anyone working in R (regardless of experience!) is invited and encouraged to join.
Unmatched parenthesis
mean_x <- mean(c(1, 3, 5, 7, 8, 21) # missing closing parenthesis
mean_x <- mean(c(1, 3, 5, 7, 8, 21)) # correct
Unmatched quotes
birds <- c("black-capped chickadee", "golden-crowned kinglet, "wood thrush") # missing quote after kinglet
birds <- c("black-capped chickadee", "golden-crowned kinglet", "wood thrush") # corrected
Missing a comma between elements
birds <- c("black-capped chickadee", "golden-crowned kinglet" "wood thrush") # missing comma after kinglet
birds <- c("black-capped chickadee", "golden-crowned kinglet", "wood thrush") # corrected
Misspelled function name
x_mean <- maen(x) # misspelled mean
x_mean <- mean(x) # Corrected
Incorrect use of dimensions with brackets
# Missing comma to indicate subsetting rows (records)
ACAD_wetland2 <- ACAD_wetland[!is.na(ACAD_wetland$Site_Name)]
## Error in `[.data.frame`(ACAD_wetland, !is.na(ACAD_wetland$Site_Name)): undefined columns selected
# Correct
ACAD_wetland2 <- ACAD_wetland[!is.na(ACAD_wetland$Site_Name), ]
There’s a lot of great online material for learning new applications of R. The ones we’ve used the most are listed below.